In cpprb version 8 and newer, you can store any number of environments (i.e. observation, action, etc.).
For example, you can add your own special environments such as `next_next_obs`, `second_reward`, and so on.
These environments can have a multi-dimensional shape (e.g. `3`, `(4,4)`, `(84,84,4)`) and any numpy data type.
__init__
To construct a replay buffer, you need to specify the second parameter of its constructor, `env_dict`.
`env_dict` is a `dict` whose keys are environment names and whose values are `dict`s describing their properties.
The following table lists the supported properties and their default values.
| key | description | type | default value |
|---|---|---|---|
| `shape` | shape (size of each dimension) | `int` or array like of `int` | `1` |
| `dtype` | data type | `numpy.dtype` | `default_dtype` in constructor or `numpy.single` |
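To make the defaults in the table concrete, here is a minimal sketch that fills them in for a small `env_dict`. The helper `resolve_env_dict` is hypothetical and is not part of cpprb; it only illustrates the rules above (missing `shape` becomes `1`, missing `dtype` falls back to a default, assumed here to be `numpy.single`).

```python
import numpy as np


def resolve_env_dict(env_dict, default_dtype=np.single):
    """Hypothetical helper (not cpprb API): apply the table's defaults."""
    resolved = {}
    for name, props in env_dict.items():
        # `shape` may be an int or an array-like of int; default is 1.
        shape = tuple(int(s) for s in np.atleast_1d(props.get("shape", 1)))
        # `dtype` falls back to the default when it is not given.
        dtype = np.dtype(props.get("dtype", default_dtype))
        resolved[name] = {"shape": shape, "dtype": dtype}
    return resolved


env_dict = {
    "obs": {"shape": (84, 84, 4), "dtype": np.uint8},  # image-like observation
    "act": {"shape": 3},                               # dtype left to the default
    "rew": {},                                         # both left to the defaults
}

resolved = resolve_env_dict(env_dict)
for name, props in resolved.items():
    print(name, props["shape"], props["dtype"])
```

Normalizing an `int` shape to a one-element tuple is a choice made for this sketch so that every environment can be handled uniformly.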
add
When `add`-ing environments to the replay buffer, you have to pass them as keyword arguments (i.e. `key=value` style). If an environment name is not a syntactically valid identifier, you can create a dictionary first and then unpack it with the `**` operator (e.g. `rb.add(**kwargs)`).
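The unpacking trick is plain Python behaviour and can be tried without a replay buffer. In the sketch below, `add` is a stand-in function that merely accepts keyword arguments, and the names `next-obs` and `2nd_reward` are made-up examples of keys that are not valid identifiers.

```python
def add(**kwargs):
    """Stand-in for a keyword-only `add` method; returns what it received."""
    return kwargs


# `add(next-obs=...)` would be a syntax error, because `next-obs` is not an
# identifier. Building a dict first and unpacking it with `**` avoids this.
step = {
    "obs": [0.0, 1.0],
    "next-obs": [1.0, 2.0],
    "2nd_reward": 0.5,
}

received = add(**step)  # with a real buffer: rb.add(**step)
print(sorted(received))
```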
sample
`sample` returns a `dict` whose keys are the environment names and whose values are the sampled data.
```python
from cpprb import ReplayBuffer
import numpy as np

buffer_size = 32

# Each key names an environment; each value describes its shape and dtype.
rb = ReplayBuffer(buffer_size, {"obs": {"shape": (4,4)},
                                "act": {"shape": 1},
                                "rew": {},
                                "next_obs": {"shape": (4,4)},
                                "next_next_obs": {"shape": (4,4)},
                                "done": {},
                                "my_important_info": {"dtype": np.short}})

# Environments are passed as keyword arguments.
for _ in range(100):
    rb.add(obs=np.zeros((4,4)),
           act=1.5,
           rew=0.0,
           next_obs=np.zeros((4,4)),
           next_next_obs=np.zeros((4,4)),
           done=0,
           my_important_info=2)

# The result is a dict keyed by environment name.
rb.sample(64)
```
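To show what the sampled `dict` looks like, the following numpy-only sketch mimics uniform sampling from per-environment arrays. It is not cpprb's implementation: the `storage` layout (one array per environment, first axis indexing buffer slots) and sampling with replacement are assumptions, the latter suggested by the example above requesting 64 samples from a buffer of size 32.

```python
import numpy as np

buffer_size, batch_size = 32, 64
rng = np.random.default_rng(0)

# Stand-in storage: one array per environment, first axis = buffer slot.
storage = {
    "obs": np.zeros((buffer_size, 4, 4), dtype=np.single),
    "rew": np.zeros((buffer_size, 1), dtype=np.single),
}

# Draw slot indexes with replacement, so batch_size may exceed buffer_size.
idx = rng.integers(0, buffer_size, size=batch_size)

# The batch has the same keys as the storage, each with a leading batch axis.
batch = {name: data[idx] for name, data in storage.items()}

for name, value in batch.items():
    print(name, value.shape)
```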
`priorities`, `weights`, and `indexes` for `PrioritizedReplayBuffer` are special environments and are set automatically.
Internally, these flexible environments are implemented with (the Cython version of) `numpy.ndarray`. In versions older than 8, they were implemented in C++ code, which limited the flexibility of data types and the number of environments. (There was a dirty hack of putting all extra environments into `act`, which was not treated specially.)